[rocprof-compute] Adding Triton backend to marker injection #6901

Merged
ggottipa-amd merged 40 commits into rocprofiler-compute-develop from users/ggottipa-amd/add-triton-backend
Jun 30, 2026

@ggottipa-amd ggottipa-amd commented Jun 8, 2026

Contributor

Motivation

This PR enables operator tracing for Triton workloads.

Technical Details

  1. Wraps Triton's launch entry points — JITFunction.run for eager kernels and CompiledKernel.run/call for compiled/Inductor kernels — to emit a ROCTX marker per launch, attributed to the user's call site.
  2. Adds --triton-trace (and --ml-api-trace for all backends) for profiling and --list-triton-operators / --triton-operator for analysis, driven by a shared backend registry.
  3. Each marker is tagged with its framework so Triton and Torch kernels are analyzed independently.
  4. Marker names are percent-encoded so names containing slashes don't break parsing, with identical encode/decode in the Python tier and the analyzer.
  5. Adds Triton backend unit tests (launch-point wrapping, reentrancy, kernel-name resolution, percent-encode round trip), C++ gtests for encode/decode symmetry, framework-selection and per-backend analyze-filter tests, and a GPU+Triton-gated end-to-end profile-and-analyze test over a sample Triton workload.
  6. Adds two cached profiling workloads: tests/workloads/triton_trace/MI300A and tests/workloads/ml_api_trace/MI300A, so the per-backend analyze tests run CPU-only in CI.
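
The launch-point wrapping in item 1 can be illustrated with a minimal sketch. It assumes nothing about the PR's actual code: `FakeJITFunction`, `FakeCompiledKernel`, `wrap_launch`, and the `emitted` list are hypothetical stand-ins for Triton's classes and for real ROCTX range calls. It shows only one marker per launch, a reentrancy guard for nested launch paths, and idempotent wrapping. Call-site attribution is not covered.

```python
import functools
import threading

_state = threading.local()  # per-thread reentrancy flag
emitted = []                # stand-in for ROCTX push/pop calls (hypothetical)


def wrap_launch(cls, method_name, framework="triton"):
    """Replace cls.method_name with a wrapper that emits one marker per launch."""
    original = getattr(cls, method_name)
    if getattr(original, "_marker_wrapped", False):
        return  # already wrapped: avoid double instrumentation

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        if getattr(_state, "active", False):
            # A wrapped launch is already in flight on this thread,
            # so the nested call must not emit a second marker.
            return original(self, *args, **kwargs)
        _state.active = True
        name = getattr(self, "name", type(self).__name__)
        emitted.append(("push", framework, name))
        try:
            return original(self, *args, **kwargs)
        finally:
            emitted.append(("pop", framework, name))
            _state.active = False

    wrapper._marker_wrapped = True
    setattr(cls, method_name, wrapper)


class FakeCompiledKernel:  # hypothetical stand-in, not Triton's class
    name = "add_kernel"

    def run(self, n):
        return n * 2


class FakeJITFunction:  # hypothetical stand-in, not Triton's class
    name = "add_kernel"

    def __init__(self):
        self.compiled = FakeCompiledKernel()

    def run(self, n):
        # The eager path ends up calling the compiled kernel's run().
        return self.compiled.run(n)


wrap_launch(FakeJITFunction, "run")
wrap_launch(FakeCompiledKernel, "run")
wrap_launch(FakeCompiledKernel, "run")  # second call is a no-op

result = FakeJITFunction().run(21)
```

Both entry points are wrapped, but the nested `FakeCompiledKernel.run` call happens while the outer wrapper is active, so `emitted` holds a single push/pop pair for the launch.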

JIRA ID

AIPROFCOMP-635

Test Plan

Triton backend unit tests (wrapping, reentrancy dedup, kernel-name, percent-encode round trip)
ctest -R test_inject_roctx_package

Framework selection (flag -> backend, env var)
ctest -R test_profiler_base

Analyze: per-backend operator filtering
ctest -R test_analyze_workloads

Utils: operator pattern parsing
ctest -R test_utils

C++ gtests: percent-encode + round-trip decode
ctest -R test-roctx-recordfn

End-to-end Triton profile + analyze (requires GPU + Triton)
ctest -R test_profile_triton_trace

Test Result

Tests Pass

@ggottipa-amd ggottipa-amd requested review from a team and prbasyal-amd as code owners June 8, 2026 09:28
Copilot AI review requested due to automatic review settings June 8, 2026 09:28
@ggottipa-amd ggottipa-amd requested a review from a team as a code owner June 8, 2026 09:28
@ggottipa-amd ggottipa-amd marked this pull request as draft June 8, 2026 09:28
@github-actions github-actions Bot added the documentation and project: rocprofiler-compute labels Jun 8, 2026

Copilot AI left a comment

Contributor

Pull request overview

This PR extends rocprofiler-compute’s ROCTX marker injection and analysis pipeline to support a Triton backend alongside the existing PyTorch backend, adding new CLI flags, backend-aware operator filtering/listing, and shared Python-tier ROCTX initialization.

Changes:

  • Add Triton tracing/profile CLI support (--triton-trace) and analysis CLI support (--list-triton-operators, --triton-operator) with updated docs.
  • Implement and test a hardened Triton ROCTX injection backend that wraps Triton kernel launch entry points and records each marker's backend in a Backend column.
  • Generalize analyze-side operator selection/listing to be backend-aware (torch vs triton), including per-backend matched trace storage.
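
The backend-aware selection described above might look roughly like the following sketch. It is not the code in analysis_cli.py: the row layout, `filter_operators`, `match_per_backend`, and the glob-style pattern syntax are assumptions for illustration. Only the `Backend` column name and the torch/triton split come from this PR.

```python
from fnmatch import fnmatchcase

KNOWN_BACKENDS = ("torch", "triton")  # assumed registry contents


def filter_operators(rows, backend, patterns=None):
    """Keep rows of one backend, optionally narrowed by glob-style patterns."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    selected = [row for row in rows if row["Backend"] == backend]
    if patterns:
        selected = [
            row for row in selected
            if any(fnmatchcase(row["Operator"], pat) for pat in patterns)
        ]
    return selected


def match_per_backend(rows, requested):
    """Store matched rows separately per backend, keyed by backend name."""
    return {
        backend: filter_operators(rows, backend, patterns)
        for backend, patterns in requested.items()
    }


# Hypothetical consolidated API-trace rows.
rows = [
    {"Backend": "torch", "Operator": "aten::add"},
    {"Backend": "triton", "Operator": "add_kernel"},
    {"Backend": "triton", "Operator": "softmax_kernel"},
    {"Backend": "torch", "Operator": "aten::softmax"},
]

matched = match_per_backend(
    rows, {"torch": ["aten::*"], "triton": ["softmax*"]}
)
```

Keeping one result list per backend means a Triton pattern can never select a Torch operator with a similar name, which is the independence the PR description asks for.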

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| projects/rocprofiler-compute/tests/test_utils.py | Adds tests for backend-aware operator pattern parsing and backend filtering in analysis CLI. |
| projects/rocprofiler-compute/tests/test_profiler_base.py | Adds tests validating CLI framework selection and subprocess env var construction. |
| projects/rocprofiler-compute/tests/test_inject_roctx_module.py | Adds Triton backend unit tests (wrapping behavior, reentrancy guard, python-tier handling, kernel name extraction). |
| projects/rocprofiler-compute/src/utils/tty.py | Generalizes operator listing output to support a configurable framework label (PyTorch/Triton). |
| projects/rocprofiler-compute/src/utils/schema.py | Extends Workload to store matched API-trace rows per backend. |
| projects/rocprofiler-compute/src/utils/inject_roctx/_core.py | Introduces shared Python-tier ROCTX initialization (ensure_python_tier) and candidate path tracking. |
| projects/rocprofiler-compute/src/utils/inject_roctx/_backends/_triton.py | Adds/extends Triton backend wrapping logic for kernel launches and framework-root registration. |
| projects/rocprofiler-compute/src/utils/inject_roctx/_backends/_torch.py | Switches torch backend to use shared Python-tier initialization and shared candidate-path reporting. |
| projects/rocprofiler-compute/src/rocprof_compute_profile/profiler_base.py | Adds Triton framework selection mapping for ROCTX injection env var. |
| projects/rocprofiler-compute/src/rocprof_compute_analyze/analysis_cli.py | Refactors operator list/filter flow to be backend-aware and adds backend filtering on consolidated API traces. |
| projects/rocprofiler-compute/src/argparser.py | Adds CLI flags for Triton tracing and Triton operator listing/filtering. |
| projects/rocprofiler-compute/docs/how-to/profile/mode.rst | Documents Triton trace usage, requirements, and relationship to ROCTX frameworks env var. |
| projects/rocprofiler-compute/docs/how-to/analyze/cli.rst | Documents Triton operator analysis and its CLI usage. |

Comment thread projects/rocprofiler-compute/src/utils/inject_roctx/_backends/_triton.py Outdated
Comment thread projects/rocprofiler-compute/src/rocprof_compute_analyze/analysis_cli.py Outdated
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/inject-roctx-multi-framework branch 2 times, most recently from baf2456 to de095d3 Compare June 8, 2026 14:46
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch from f2d20b8 to 50aabb3 Compare June 9, 2026 09:01
@ggottipa-amd ggottipa-amd marked this pull request as ready for review June 11, 2026 16:17
@ggottipa-amd ggottipa-amd changed the title Adding Triton backend to marker injection [rocprof-compute] Adding Triton backend to marker injection Jun 12, 2026
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/inject-roctx-multi-framework branch from c4ac4c9 to 46559c0 Compare June 15, 2026 06:41
@ggottipa-amd ggottipa-amd requested a review from a team as a code owner June 15, 2026 06:41
@ggottipa-amd ggottipa-amd marked this pull request as draft June 17, 2026 14:24
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/inject-roctx-multi-framework branch from 61a6658 to 81ee094 Compare June 18, 2026 11:28
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch from 657626d to 7007129 Compare June 18, 2026 12:43
@ggottipa-amd ggottipa-amd marked this pull request as ready for review June 18, 2026 14:28
Comment thread projects/rocprofiler-compute/src/utils/inject_roctx/_backends/triton.py Outdated
Comment thread projects/rocprofiler-compute/src/utils/inject_roctx/_backends/triton.py Outdated
Comment thread projects/rocprofiler-compute/tests/test_inject_roctx_package.py Outdated
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch from a88ff64 to 6a8df5c Compare June 19, 2026 09:06
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/inject-roctx-multi-framework branch 2 times, most recently from bff659e to b0b3164 Compare June 22, 2026 12:29
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch from b72859d to ab4f20e Compare June 22, 2026 14:58
Base automatically changed from users/ggottipa-amd/inject-roctx-multi-framework to rocprofiler-compute-develop June 22, 2026 18:03
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch 3 times, most recently from 47add6b to 41d52a1 Compare June 24, 2026 09:27
@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch from 066804e to 491b779 Compare June 29, 2026 08:02

@xuchen-amd xuchen-amd left a comment

Contributor

Thanks for addressing my comments. I would appreciate feedback from another reviewer (a third one, in addition to Carrie and myself).

Comment thread projects/rocprofiler-compute/docs/how-to/analyze/cli.rst
Comment thread projects/rocprofiler-compute/docs/how-to/analyze/cli.rst
Comment thread projects/rocprofiler-compute/docs/how-to/profile/mode.rst Outdated

@vedithal-amd vedithal-amd left a comment

Contributor

I will do a quick review here by EOD.

@vedithal-amd vedithal-amd left a comment

Contributor

A few maintenance-cost notes on the docs plus a coverage-gap flag on the Triton path. See inline.

Comment thread projects/rocprofiler-compute/docs/how-to/analyze/cli.rst
Comment thread projects/rocprofiler-compute/docs/how-to/profile/mode.rst Outdated
Comment thread projects/rocprofiler-compute/docs/how-to/profile/mode.rst
Comment thread projects/rocprofiler-compute/CMakeLists.txt

@vedithal-amd vedithal-amd left a comment

Contributor

One follow-up note on package naming (see inline).

- C++ build_marker_string: exact reserve() for percent-encoding, extract
  encode_marker_segment + shared escape constants reused by the round-trip test
- analysis_base: collapse per-backend trace-validation into one loop
- rename KNOWN_BACKENDS -> KNOWN_ML_API_BACKENDS
- rename _BACKEND_CLI -> _ML_API_ANALYSIS_CLI_OPTIONS
- utils_analysis: inline decode_marker_name in build_call_trees
- test_utils: drop redundant function-local pandas imports

Co-Authored-By: Claude <noreply@anthropic.com>

@vedithal-amd vedithal-amd left a comment

Contributor

LGTM

Pushed one commit (60e1e65) with review nits, all mechanical, no behavior change:

  • C++ build_marker_string: exact reserve() accounting for percent-encoding; extracted encode_marker_segment + shared escape constants, reused by the round-trip test so the escape table has one definition.
  • analysis_base: collapsed the duplicated per-backend trace-validation into a single loop over the backend list.
  • Renamed KNOWN_BACKENDS -> KNOWN_ML_API_BACKENDS and _BACKEND_CLI -> _ML_API_ANALYSIS_CLI_OPTIONS.
  • utils_analysis: inlined decode_marker_name in build_call_trees.
  • test_utils: dropped redundant function-local pandas imports.
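
For readers unfamiliar with the encode/decode pairing, here is a Python sketch of the idea behind the first bullet: one shared escape table, an exact pre-computed encoded length (the analogue of the exact reserve()), and a decoder that inverts the encoder. The names encode_marker_segment and decode_marker_name come from this PR, but the bodies below, the `ESCAPES` table contents, and the `encoded_length` helper are assumptions, not the merged implementation.

```python
# Single definition of the escape table, shared by encoder, decoder and tests.
# '%' itself must be escaped so that decoding is unambiguous.
ESCAPES = {"%": "%25", "/": "%2F"}  # assumed set of escaped characters
_DECODE = {code: char for char, code in ESCAPES.items()}


def encoded_length(segment):
    """Exact size of the encoded segment, computed before encoding."""
    return sum(len(ESCAPES.get(ch, ch)) for ch in segment)


def encode_marker_segment(segment):
    """Percent-encode the characters listed in ESCAPES; pass others through."""
    return "".join(ESCAPES.get(ch, ch) for ch in segment)


def decode_marker_name(encoded):
    """Invert encode_marker_segment using the same escape table."""
    out = []
    i = 0
    while i < len(encoded):
        token = encoded[i:i + 3]
        if token in _DECODE:
            out.append(_DECODE[token])
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


name = "a/b%2Fc"  # contains a slash and a literal "%2F"
encoded = encode_marker_segment(name)
decoded = decode_marker_name(encoded)
```

Because the decoder is built from the same `ESCAPES` table, adding a new escaped character in one place keeps both directions symmetric, which is the point of sharing the constants with the round-trip test.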

Follow-ups left as separate notes (not blocking): renaming _backends -> backends, and adding an ML-API coverage test.

@ggottipa-amd ggottipa-amd force-pushed the users/ggottipa-amd/add-triton-backend branch from 9ee8251 to ccd5d6d Compare June 30, 2026 05:50
@ggottipa-amd ggottipa-amd dismissed prbasyal-amd’s stale review June 30, 2026 06:01

Resolved all the issues raised by Pratik. Dismissing his review to unblock the merge, since he is OOO this week and can't approve before the end of this sprint.

@ggottipa-amd ggottipa-amd merged commit 7026950 into rocprofiler-compute-develop Jun 30, 2026
27 checks passed
@ggottipa-amd ggottipa-amd deleted the users/ggottipa-amd/add-triton-backend branch June 30, 2026 09:00
8 participants